OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Defeating line-noise CAPTCHAs with multiple quadratic snakes

Internal identifier: 000204 (Main/Exploration); previous: 000203; next: 000205

Authors: Yoichi Nakaguro [Thailand]; Matthew N. Dailey [Thailand]; Sanparith Marukatat [Thailand]; Stanislav S. Makhanov [Thailand]

Source: Computers & security, 2013 (ISSN 0167-4048)

RBID : Pascal:13-0362945

Abstract

Optical character recognition (OCR) is one of the fundamental problems in artificial intelligence and image processing, but recent progress in OCR represents a security challenge for Web sites that throttle requests with image-based CAPTCHAs (Completely Automated Public Turing Tests to Tell Computers and Humans Apart). A CAPTCHA is a challenge-response test placed within web forms to determine whether the user is human. Unfortunately, algorithms capable of solving image-based CAPTCHAs can be used to create spam accounts and design malicious denial-of-service (DoS) attacks, causing financial and social damage. The problem of defeating digital image CAPTCHAs is thus twofold. On the one hand, it is an important problem in artificial intelligence and image processing. On the other hand, publicly available CAPTCHAs that are not tested against state-of-the-art machine recognition algorithms may make the systems vulnerable to attack by software bots. This paper considers a very important subclass of text CAPTCHAs, those characterized by salt-and-pepper noise combined with line (curve) noise. Thus far, attacks on CAPTCHAs with this type of noise have used relatively simple image processing methods with some success, but state-of-the-art segmentation methods have not been fully exploited. In this paper, we propose and benchmark two strong segmentation methods. The first method is a modification of a multiple quadratic snake proposed for road extraction from satellite images. The second competing method is a boundary tracing routine available in the OpenCV open-source library. A first numerical experiment indicates excellent accuracy for both methods. A second experiment on human recognition shows that the CAPTCHAs used in the study are already near the threshold of being too hard for humans. Finally, a third numerical experiment presents a more difficult set of CAPTCHAs with the addition of anti-binarization methods. The snake-based method is shown to be more resilient to anti-binarization schemes than boundary tracing and state-of-the-art projection-based attacks on CAPTCHAs. Since CAPTCHAs corrupted by small line noise are shown to be difficult for humans and relatively easy for our algorithm, CAPTCHA designers should introduce more challenging distortions into their CAPTCHAs, lest the security of systems based on them be compromised.
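
As a rough illustration of the "relatively simple image processing methods" the abstract mentions, the sketch below applies a 3x3 median filter, a common textbook baseline for salt-and-pepper noise. It is not the snake-based or boundary-tracing method of the paper. The function name `median_filter_3x3`, the list-of-rows image layout, and the toy image are assumptions made for this example only.

```python
from statistics import median


def median_filter_3x3(image):
    """Return a filtered copy of `image` (a list of rows of grey levels).

    Each pixel is replaced by the median of its 3x3 neighbourhood;
    neighbourhoods are clipped at the image border.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(int(median(window)))
        result.append(row)
    return result


# Toy 5x5 image (hypothetical data): white background (255)
# with a single isolated black "pepper" pixel in the centre.
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0

cleaned = median_filter_3x3(noisy)
print(cleaned[2][2])
```

A filter of this kind removes isolated pixels, but line noise about as thick as the character strokes would pass through it largely unchanged, which is presumably where stronger segmentation methods become relevant.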


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Defeating line-noise CAPTCHAs with multiple quadratic snakes</title>
<author>
<name sortKey="Nakaguro, Yoichi" sort="Nakaguro, Yoichi" uniqKey="Nakaguro Y" first="Yoichi" last="Nakaguro">Yoichi Nakaguro</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Sirindhorn International Institute of Technology, Thammasat University</s1>
<s3>THA</s3>
<sZ>1 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Thaïlande</country>
<wicri:noRegion>Sirindhorn International Institute of Technology, Thammasat University</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Dailey, Matthew N" sort="Dailey, Matthew N" uniqKey="Dailey M" first="Matthew N." last="Dailey">Matthew N. Dailey</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Computer Science and Information Management, Asian Institute of Technology</s1>
<s3>THA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Thaïlande</country>
<wicri:noRegion>Computer Science and Information Management, Asian Institute of Technology</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Marukatat, Sanparith" sort="Marukatat, Sanparith" uniqKey="Marukatat S" first="Sanparith" last="Marukatat">Sanparith Marukatat</name>
<affiliation wicri:level="1">
<inist:fA14 i1="03">
<s1>National Electronics and Computer Technology Center</s1>
<s3>THA</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Thaïlande</country>
<wicri:noRegion>National Electronics and Computer Technology Center</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Makhanov, Stanislav S" sort="Makhanov, Stanislav S" uniqKey="Makhanov S" first="Stanislav S." last="Makhanov">Stanislav S. Makhanov</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Sirindhorn International Institute of Technology, Thammasat University</s1>
<s3>THA</s3>
<sZ>1 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Thaïlande</country>
<wicri:noRegion>Sirindhorn International Institute of Technology, Thammasat University</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">13-0362945</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 13-0362945 INIST</idno>
<idno type="RBID">Pascal:13-0362945</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000038</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000730</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000047</idno>
<idno type="wicri:doubleKey">0167-4048:2013:Nakaguro Y:defeating:line:noise</idno>
<idno type="wicri:Area/Main/Merge">000207</idno>
<idno type="wicri:Area/Main/Curation">000204</idno>
<idno type="wicri:Area/Main/Exploration">000204</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Defeating line-noise CAPTCHAs with multiple quadratic snakes</title>
<author>
<name sortKey="Nakaguro, Yoichi" sort="Nakaguro, Yoichi" uniqKey="Nakaguro Y" first="Yoichi" last="Nakaguro">Yoichi Nakaguro</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Sirindhorn International Institute of Technology, Thammasat University</s1>
<s3>THA</s3>
<sZ>1 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Thaïlande</country>
<wicri:noRegion>Sirindhorn International Institute of Technology, Thammasat University</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Dailey, Matthew N" sort="Dailey, Matthew N" uniqKey="Dailey M" first="Matthew N." last="Dailey">Matthew N. Dailey</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Computer Science and Information Management, Asian Institute of Technology</s1>
<s3>THA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Thaïlande</country>
<wicri:noRegion>Computer Science and Information Management, Asian Institute of Technology</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Marukatat, Sanparith" sort="Marukatat, Sanparith" uniqKey="Marukatat S" first="Sanparith" last="Marukatat">Sanparith Marukatat</name>
<affiliation wicri:level="1">
<inist:fA14 i1="03">
<s1>National Electronics and Computer Technology Center</s1>
<s3>THA</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Thaïlande</country>
<wicri:noRegion>National Electronics and Computer Technology Center</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Makhanov, Stanislav S" sort="Makhanov, Stanislav S" uniqKey="Makhanov S" first="Stanislav S." last="Makhanov">Stanislav S. Makhanov</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Sirindhorn International Institute of Technology, Thammasat University</s1>
<s3>THA</s3>
<sZ>1 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Thaïlande</country>
<wicri:noRegion>Sirindhorn International Institute of Technology, Thammasat University</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Computers &amp; security</title>
<title level="j" type="abbreviated">Comput. secur.</title>
<idno type="ISSN">0167-4048</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Computers &amp; security</title>
<title level="j" type="abbreviated">Comput. secur.</title>
<idno type="ISSN">0167-4048</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Active contour</term>
<term>Artificial intelligence</term>
<term>Authentication</term>
<term>Character recognition</term>
<term>Computer attack</term>
<term>Computer security</term>
<term>Denial of service</term>
<term>Digital image</term>
<term>Edge detection</term>
<term>Electronic mail</term>
<term>Image processing</term>
<term>Image segmentation</term>
<term>Internet</term>
<term>Optical character recognition</term>
<term>Pulse noise</term>
<term>Spam</term>
<term>Text</term>
<term>Tracing</term>
<term>User interface</term>
<term>Web site</term>
<term>World wide web</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Authentification</term>
<term>Interface utilisateur</term>
<term>Sécurité informatique</term>
<term>Détection contour</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance caractère</term>
<term>Intelligence artificielle</term>
<term>Traitement image</term>
<term>Internet</term>
<term>Déni service</term>
<term>Image numérique</term>
<term>Attaque informatique</term>
<term>Site Web</term>
<term>Réseau web</term>
<term>Courriel</term>
<term>Bruit impulsion</term>
<term>Traçage</term>
<term>Contour actif</term>
<term>Texte</term>
<term>Segmentation image</term>
<term>Spam</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Intelligence artificielle</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Optical character recognition (OCR) is one of the fundamental problems in artificial intelligence and image processing, but recent progress in OCR represents a security challenge for Web sites that throttle requests with image-based CAPTCHAs (Completely Automated Public Turing Tests to Tell Computers and Humans Apart). A CAPTCHA is a challenge-response test placed within web forms to determine whether the user is human. Unfortunately, algorithms capable of solving image-based CAPTCHAs can be used to create spam accounts and design malicious denial-of-service (DoS) attacks, causing financial and social damage. The problem of defeating digital image CAPTCHAs is thus twofold. On the one hand, it is an important problem in artificial intelligence and image processing. On the other hand, publicly available CAPTCHAs that are not tested against state-of-the-art machine recognition algorithms may make the systems vulnerable to attack by software bots. This paper considers a very important subclass of text CAPTCHAs, those characterized by salt-and-pepper noise combined with line (curve) noise. Thus far, attacks on CAPTCHAs with this type of noise have used relatively simple image processing methods with some success, but state-of-the-art segmentation methods have not been fully exploited. In this paper, we propose and benchmark two strong segmentation methods. The first method is a modification of a multiple quadratic snake proposed for road extraction from satellite images. The second competing method is a boundary tracing routine available in the OpenCV open-source library. A first numerical experiment indicates excellent accuracy for both methods. A second experiment on human recognition shows that the CAPTCHAs used in the study are already near the threshold of being too hard for humans. Finally, a third numerical experiment presents a more difficult set of CAPTCHAs with the addition of anti-binarization methods. The snake-based method is shown to be more resilient to anti-binarization schemes than boundary tracing and state-of-the-art projection-based attacks on CAPTCHAs. Since CAPTCHAs corrupted by small line noise are shown to be difficult for humans and relatively easy for our algorithm, CAPTCHA designers should introduce more challenging distortions into their CAPTCHAs, lest the security of systems based on them be compromised.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Thaïlande</li>
</country>
</list>
<tree>
<country name="Thaïlande">
<noRegion>
<name sortKey="Nakaguro, Yoichi" sort="Nakaguro, Yoichi" uniqKey="Nakaguro Y" first="Yoichi" last="Nakaguro">Yoichi Nakaguro</name>
</noRegion>
<name sortKey="Dailey, Matthew N" sort="Dailey, Matthew N" uniqKey="Dailey M" first="Matthew N." last="Dailey">Matthew N. Dailey</name>
<name sortKey="Makhanov, Stanislav S" sort="Makhanov, Stanislav S" uniqKey="Makhanov S" first="Stanislav S." last="Makhanov">Stanislav S. Makhanov</name>
<name sortKey="Marukatat, Sanparith" sort="Marukatat, Sanparith" uniqKey="Marukatat S" first="Sanparith" last="Marukatat">Sanparith Marukatat</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000204 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000204 | SxmlIndent | more
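
Once a record has been extracted with one of the commands above, its fields can also be read with a general-purpose XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `summarize` function and the trimmed `RECORD_EXCERPT` string are assumptions made for illustration. The excerpt deliberately omits the `inist:` and `wicri:` elements, because those prefixes are not declared as namespaces in the record as displayed and a strict parser would reject them as unbound.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical excerpt of the record shown on this page
# (title and the first two authors only, attributes reduced).
RECORD_EXCERPT = """
<record>
  <TEI>
    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title xml:lang="en" level="a">Defeating line-noise CAPTCHAs with multiple quadratic snakes</title>
          <author><name first="Yoichi" last="Nakaguro">Yoichi Nakaguro</name></author>
          <author><name first="Matthew N." last="Dailey">Matthew N. Dailey</name></author>
        </titleStmt>
      </fileDesc>
    </teiHeader>
  </TEI>
</record>
"""


def summarize(record_xml):
    """Return (title, list of author names) from a record string."""
    root = ET.fromstring(record_xml.strip())
    title_stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    title = title_stmt.findtext("title")
    authors = [name.text for name in title_stmt.findall("author/name")]
    return title, authors


title, authors = summarize(RECORD_EXCERPT)
print(title)
print("; ".join(authors))
```

To apply the same idea to a full record, the undeclared prefixes would first need namespace declarations on the root element (or would need to be stripped), and any bare `&` in text would need to be written as `&amp;`.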

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:13-0362945
   |texte=   Defeating line-noise CAPTCHAs with multiple quadratic snakes
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024